README for Human-Annotated Posts Dataset
Submitted to ACL 2026 Findings | Harvard Dataverse

Overview
--------
This dataset contains 1,621 human-annotated Facebook political posts from the Philippines, stored as 1,621 rows and 48 columns. It supports analysis of post content, sentiment, engagement, misinformation, and other characteristics relevant to the study of political communication and information quality in the Philippine online information ecosystem.

This resource is made available in conjunction with a paper submitted to ACL 2026 Findings:

    Herrera, M. J., Jaidka, K., and Luyt, B. Misinformation Commands Attention:
    An English-Tagalog Dataset of Political Discussions on Philippine Facebook Pages.
    Submitted to Findings of the Association for Computational Linguistics: ACL 2026.

Dataset Structure
-----------------
Number of Rows: 1,621
Number of Columns: 48
Language(s): English and Filipino (Tagalog)
Source: Facebook political pages (Philippines)

Key Columns:

1. Post Metadata:
   - Post Date: The date the post was published.
   - Title: Title or headline of the post.
   - Content: Full textual content of the post (English and/or Filipino).
   - Site Name and Site URL: The source platform's name and URL.
   - Channel Information: Channel name, type, and country.

2. Classification:
   - Category: Type or category of content (e.g., News, Opinion).
   - Classification: Whether the post is factual or non-factual.
   - Misleading: Boolean flag for misleading content.
   - Contains Factual Error: Boolean flag for factual errors.
   - Additional markers:
     - Misrepresentation or missing context.
     - Digitally altered photos/videos.
     - Outdated information.

3. Engagement Metrics:
   - No. of Comments: Count of comments on the post.
   - Influence Score: A metric to evaluate the post's reach or importance.
   - Sentiment Score: Sentiment analysis score for the post.

4. Temporal Data:
   - Date breakdown fields:
     - YearKey, MonthOfYear, DayOfWeekName, etc.
     - QuarterOfYear and WeekOfMonth for granular temporal analysis.

5. Misinformation Flags:
   - Boolean fields indicating:
     - Whether the post contains unverified claims.
     - Whether it presents satire that could be misread as fact.

6. Miscellaneous:
   - Voice Name and Voice URL: Information about the primary account or entity behind the post.
   - On Watchlist: Indicates if the post or account is flagged for special observation.
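
A quick way to confirm that a downloaded copy matches the structure described above is to count its rows and columns. The following is a minimal sketch using only the Python standard library. It assumes the data is available as a UTF-8 CSV file; the file name `annotated_posts.csv` and the helpers `load_rows` and `check_structure` are hypothetical and not part of the dataset release.

```python
import csv
import io

# Shape stated in this README.
EXPECTED_ROWS = 1621
EXPECTED_COLS = 48


def load_rows(handle):
    """Read a CSV file object into (column names, list of row dicts)."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def check_structure(fieldnames, rows,
                    expected_rows=EXPECTED_ROWS, expected_cols=EXPECTED_COLS):
    """Return a list of mismatch messages; an empty list means the shape matches."""
    problems = []
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")
    if len(fieldnames) != expected_cols:
        problems.append(f"expected {expected_cols} columns, found {len(fieldnames)}")
    return problems


# Example usage (the file name is a placeholder, adjust to your local copy):
# with open("annotated_posts.csv", newline="", encoding="utf-8") as f:
#     fieldnames, rows = load_rows(f)
# print(check_structure(fieldnames, rows) or "shape matches the README")
```

If the file is distributed in another format (for example tab-delimited), the `csv.DictReader` call would need a matching `delimiter` argument.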

Usage and Applications
----------------------
This dataset is particularly suited to NLP and computational social science research. Intended applications include:

1. Misinformation Detection: Training models to classify factual and misleading content in multilingual Philippine social media.
2. Sentiment Analysis: Understanding public sentiment across political posts.
3. Media and Communication Studies: Investigating patterns of misinformation and engagement.
4. Temporal Analysis: Analyzing trends over time, such as during elections or significant political events.
5. Engagement Analysis: Evaluating the relationship between content type and user interaction.
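
As an illustration of the engagement analysis use case, the sketch below averages comment counts per classification label. It assumes rows are dictionaries keyed by column name (as produced by `csv.DictReader`) and that the column headers are literally `Classification` and `No. of Comments`, as listed in this README; the actual headers in the file may differ. The function name `mean_comments_by_group` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_comments_by_group(rows, group_col="Classification",
                           count_col="No. of Comments"):
    """Average comment count per group.

    Rows with a missing group label or a missing/non-numeric count are skipped.
    """
    buckets = defaultdict(list)
    for row in rows:
        group_value = row.get(group_col)
        count_value = row.get(count_col)
        group = "" if group_value is None else str(group_value).strip()
        raw = "" if count_value is None else str(count_value).strip()
        if not group:
            continue
        try:
            buckets[group].append(float(raw))
        except ValueError:
            # Empty or malformed count: leave this row out of the average.
            continue
    return {group: mean(values) for group, values in buckets.items()}
```

The same function can group by any other categorical column, such as `Category`, by passing a different `group_col`.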

Sample Data
-----------
| Post Date  | Title                                    | Content                        | Category | Classification | Misleading | No. of Comments | Sentiment Score |
|------------|------------------------------------------|--------------------------------|----------|----------------|------------|-----------------|-----------------|
| 2025-01-15 | Breaking News: Major Update in Election  | Details about the election...  | News     | Factual        | False      | 123             | 0.85            |
| 2025-01-14 | Viral Meme: A Joke Goes Wrong            | A joke about climate change... | Satire   | Non-Factual    | True       | 45              | -0.10           |
| 2025-01-13 | Exclusive: Insider Info on Policy Changes| Leaked document reveals...     | Opinion  | Factual        | False      | 76              | 0.50            |

Important Notes
---------------
- Data Privacy: Ensure compliance with applicable privacy laws (e.g., the GDPR and the Philippine Data Privacy Act of 2012) when using this dataset.

- Data Cleaning: Some fields may contain null or inconsistent values and require preprocessing.

- Misleading Flags: The Misleading column reflects human annotator judgment and may contain subjective assessments. Additional validation is recommended for research use.
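
For the data cleaning note above, the following sketch shows one way to normalize Boolean flag columns and parse post dates while keeping missing or malformed values explicit as `None`. It rests on assumptions: the accepted spellings of true/false values are guesses, the date format `YYYY-MM-DD` is inferred from the sample table, and the helpers `parse_flag` and `parse_post_date` are hypothetical.

```python
from datetime import date, datetime

# Assumed spellings; extend these after inspecting the actual column values.
TRUE_VALUES = {"true", "t", "yes", "y", "1"}
FALSE_VALUES = {"false", "f", "no", "n", "0"}


def parse_flag(value):
    """Map common true/false spellings to bool; None if missing or unrecognized."""
    if value is None:
        return None
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    return None


def parse_post_date(value, fmt="%Y-%m-%d"):
    """Parse a date string into datetime.date; None if missing or malformed."""
    if value is None:
        return None
    try:
        return datetime.strptime(str(value).strip(), fmt).date()
    except ValueError:
        return None
```

Returning `None` rather than a default value keeps unparseable entries visible, so they can be counted and reviewed before any rows are dropped.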

Citation
--------
If you use this dataset, please cite the associated ACL 2026 Findings paper:

    Herrera, M. J., Jaidka, K., and Luyt, B. Misinformation Commands Attention:
    An English-Tagalog Dataset of Political Discussions on Philippine Facebook Pages.
    Submitted to Findings of the Association for Computational Linguistics: ACL 2026.

The dataset is hosted on Harvard Dataverse and should also be cited independently
per the repository's citation guidelines.